NSF PAR Search | NSF Public Access Repository

Learned Offline Query Planning via Bayesian Optimization

https://doi.org/10.1145/3725316

Tao, Jeffrey; Maus, Natalie; Jones, Haydn; Zeng, Yimeng; Gardner, Jacob R; Marcus, Ryan (June 2025, Proceedings of the ACM on Management of Data)

Analytics database workloads often contain queries that are executed repeatedly. Existing optimization techniques generally prioritize keeping optimization cost low, normally well below the time it takes to execute a single instance of a query. If a given query is going to be executed thousands of times, could it be worth investing significantly more optimization time? In contrast to traditional online query optimizers, we propose an offline query optimizer that searches a wide variety of plans and incorporates query execution as a primitive. Our offline query optimizer combines variational auto-encoders with Bayesian optimization to find optimized plans for a given query. We compare our technique to the optimal plans possible with PostgreSQL and recent RL-based systems over several datasets, and show that our technique finds faster query plans.
Full Text Available
Tyche: Making Sense of Property-Based Testing Effectiveness

https://doi.org/10.1145/3654777.3676407

Goldstein, Harrison; Tao, Jeffrey; Hatfield-Dodds, Zac; Pierce, Benjamin C; Head, Andrew (October 2024, ACM)

Software developers increasingly rely on automated methods to assess the correctness of their code. One such method is property-based testing (PBT), wherein a test harness generates hundreds or thousands of inputs and checks the outputs of the program on those inputs using parametric properties. Though powerful, PBT induces a sizable gulf of evaluation: developers need to put in nontrivial effort to understand how well the different test inputs exercise the software under test. To bridge this gulf, we propose Tyche, a user interface that supports sensemaking around the effectiveness of property-based tests. Guided by a formative design exploration, our design of Tyche supports developers with interactive, configurable views of test behavior with tight integration into modern developer testing workflows. These views help developers explore global testing behavior and individual test inputs alike. To accelerate the development of powerful, interactive PBT tools, we define a standard for PBT test reporting and integrate it with a widely used PBT library. A self-guided online usability study revealed that Tyche’s visualizations help developers assess software testing effectiveness more accurately.
Full Text Available
Physical Visualization Design: Decoupling Interface and System Design

https://doi.org/10.1145/3725334

Chen, Yiru; Li, Xupeng; Tao, Jeffrey; Ramjit, Lana; Mitra, Subrata; Ghaderi, Javad; Netravali, Ravi; Parameswaran, Aditya; Rubenstein, Dan; Wu, Eugene (June 2025, Proceedings of the ACM on Management of Data)

Interactive visualization interfaces enable users to efficiently explore, analyze, and make sense of their datasets. However, as data grows in size, it becomes increasingly challenging to build data interfaces that meet the interface designer's desired latency expectations and resource constraints. Cloud DBMSs, while optimized for scalability, often fail to meet latency expectations, necessitating complex, bespoke query execution and optimization techniques for data interfaces. This involves manually navigating a huge optimization space that is sensitive to interface design and resource constraints, such as client vs. server data and compute placement, choosing which computations are done offline vs. online, and selecting from a large library of visualization-optimized data structures. This paper advocates for a Physical Visualization Design (PVD) tool that decouples interface design from system design to provide design independence. Given an interface's underlying data flow, interactions with latency expectations, and resource constraints, PVD checks if the interface is feasible and, if so, proposes and instantiates a middleware architecture spanning the client, server, and cloud DBMS that meets the expectations. To this end, this paper presents Jade, the first prototype PVD tool that enables design independence. Jade proposes an intermediate representation called Diffplans to represent the data flows, develops cost estimation models that trade off between latency guarantees and plan feasibility, and implements an optimization framework to search for the middleware architecture that meets the guarantees. We evaluate Jade on six representative data interfaces against Mosaic and Azure SQL Database. We find Jade supports a wider range of interfaces, makes better use of available resources, and can meet a wider range of data, latency, and resource conditions.
Full Text Available
DIG: The Data Interface Grammar

https://doi.org/10.1145/3597465.3605223

Chen, Yiru; Tao, Jeffrey; Wu, Eugene (June 2023, ACM)

Building interactive data interfaces is hard because the design of an interface depends on the data processing needs for the underlying analysis task, yet we do not have a good representation for analysis tasks. To fill this gap, this paper advocates for a Data Interface Grammar (DIG) as an intermediate representation of analysis tasks. We show that DIG is compatible with existing data engineering practices, compact to represent any analysis, simple to translate into an interface design, and amenable to offline analysis. We further illustrate the potential benefits of this abstraction, such as automatic interface generation, automatic interface backend optimization, tutorial generation, and workload generation.
Full Text Available
XRP: In-Kernel Storage Functions with eBPF

Zhong, Yuhong; Li, Haoyu; Wu, Yu Jian; Zarkadas, Ioannis; Tao, Jeffrey; Mesterhazy, Evan; Makris, Michael; Yang, Junfeng; Tai, Amy; Stutsman, Ryan; et al (July 2022, Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation)

With the emergence of microsecond-scale NVMe storage devices, the Linux kernel storage stack overhead has become significant, almost doubling access times. We present XRP, a framework that allows applications to execute user-defined storage functions, such as index lookups or aggregations, from an eBPF hook in the NVMe driver, safely bypassing most of the kernel’s storage stack. To preserve file system semantics, XRP propagates a small amount of kernel state to its NVMe driver hook where the user-registered eBPF functions are called. We show how two key-value stores, BPF-KV, a simple B+-tree key-value store, and WiredTiger, a popular log-structured merge tree storage engine, can leverage XRP to significantly improve throughput and latency.
Full Text Available
